perm filename SEMANT[AM,DBL] blob sn#500052 filedate 1980-03-04 generic text, type C, neo UTF8
About a century ago, a physicist remarked:
``The Michelson interferometer is a marvelous instrument -- in
the hands of Michelson.''


Similarly, the current representation languages
are impressive aids to the construction of large knowledge-based
programs -- by the originators of those languages.
Trouble comes when the package leaves the factory: one user
will use the ToFill hook to hang procedures that do 99% of his
processing, while a second user will base his whole system around
the peculiarities of the ordering scheme of the system's agenda
mechanism.  User number three assumed the agenda was a LIFO stack,
could never get his programs to use it properly, and finally found
a way to avoid the agenda altogether.   User number four found
that by setting the IfPotentiallyRelevant slots of all his rules to T,
they all seemed to do what he wanted: evaluate the If slot.  User
five found the same kind of simplification, by setting the If slots to T
and packing all the testing into the IfPotentiallyRelevant slots.

Of course the designer knew the ``right'' way to use
each of those features, and got his power amplification from 
judiciously relying on each feature ``when appropriate.''  By
changing the spirit in which each feature was added, the new
users have created two serious problems: 

	INCONVENIENCE:  First, they don't quite
see how this new-fangled system is supposed to help them build up
large programs faster; if anything, they spend extra time trying
to get around the constraints that seem to be tying their hands
(no multiple inheritance; must specify type information).

	INCOMPATIBILITY: Second, there is little chance that two
users working independently
can easily merge their two knowledge bases (``But
`Definition' obviously refers to the clarity of the image!
You'll have to pick a new name for the slot that stores the
mathematical definition.'').  To alleviate these problems, we propose:


	THE PRINCIPLE OF ACTIVE SEMANTICS
The semantics of the representation should be (i) explicit and
(ii) enforced.  

For instance, the semantics of the Definition slot can be stated
once and
for all (as part of the representation language package that gets exported);
all users will then use it in the same way the creator
intended, and any special features that process Definitions
extraordinarily will work right.
Furthermore, two users' knowledge bases which both have definitions
will both use the same name (Definition) to stand for that relation.

Users will be users, so STATING the semantics is not sufficient;
our principle prescribes
that the system take forceful action to check that
the semantics are being faithfully adhered to, and to raise the
alarm if necessary.  

EXAMPLE 1: Suppose, for instance, that the semantics
of the Examples slot is that it be filled with entries satisfying
the predicate stored in the same unit's Definition slot.
Then, if John Brown adds the list ``(94 - 86 = 12)'' to the Examples slot of
the Subtraction unit, it will violate the predicate stored in
the Definition slot of the Subtraction unit, and
hence violate the semantics of the Examples slot.
John may wish for a slot
ReportedExamples,  similar to Examples, whose semantics
is that the entries are answers given by human subjects when
asked for Examples (whose semantics qv).  Thus some erroneous
entries (i.e., entries not legal for Examples) are to be expected.
But if John misuses the Examples slot to store such
entries, then the system should complain (or at
least get apprehensive).  
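
Example 1 can be sketched in a few lines.  What follows is only an
illustration in Python, not RLL code: the dictionary representation of a
unit, the (a, b, c) encoding of ``a - b = c'', and the function names are
all assumptions made for the sketch.

```python
# Hypothetical sketch: a unit is a plain dict; slot names follow the text.

def subtraction_definition(entry):
    """Definition predicate for Subtraction: entry (a, b, c) means a - b = c."""
    a, b, c = entry
    return a - b == c

subtraction = {"Definition": subtraction_definition, "Examples": []}

def add_example(unit, entry):
    """Store entry in Examples only if it satisfies the unit's Definition.

    Returns True if the entry was accepted.  A rejected entry is left out;
    a gentler system might store it anyway and merely raise an alarm.
    """
    if unit["Definition"](entry):
        unit["Examples"].append(entry)
        return True
    return False

accepted_good = add_example(subtraction, (94, 86, 8))
accepted_bad = add_example(subtraction, (94, 86, 12))   # John Brown's entry
```

Here the second call is refused, since it fails the Definition predicate; a
ReportedExamples slot would simply omit that test.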

EXAMPLE 2: 
As another example, assume we represent each rule as a full-fledged
unit with slots such as IF, THEN, AverageRunningTime, TimeOfCreation,
BroaderRules, etc.  We might define the semantics
of the IfPotentiallyRelevant slot of rules to be (i) containing a
LISP predicate, (ii) which is very fast to run, (iii) which returns
NIL at least 95% of the time that the IF slot of that rule would,
and (iv) which returns T at least 30% of the time the IF slot would.
Empirical data could be gathered by the system, and if the user
seemed to be deviating from the intended spirit of that slot
he could be admonished to mend his ways.  
The semantics of the IF slot are more subtle; they refer to the
relevance of the rule to the current situation, to the ultimate
utility and efficacy of executing the rule.  One can still imagine
(and RLL contains) functions which gather the appropriate empirical
data, albeit quite crudely and incompletely.
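
The bookkeeping for conditions (iii) and (iv) might look roughly like the
following Python sketch.  It is not the RLL mechanism; the class name, the
method names, and our reading of the two percentages (agreement measured
over the cases where IF returns NIL, and over the cases where IF returns T)
are assumptions.

```python
class RuleStats:
    """Hypothetical tally of how IfPotentiallyRelevant tracks the IF slot."""

    def __init__(self):
        self.if_nil = 0      # times IF returned NIL
        self.both_nil = 0    # ... and IfPotentiallyRelevant also returned NIL
        self.if_t = 0        # times IF returned T
        self.both_t = 0      # ... and IfPotentiallyRelevant also returned T

    def record(self, ipr_result, if_result):
        """Record one paired evaluation of the two slots."""
        if if_result:
            self.if_t += 1
            self.both_t += 1 if ipr_result else 0
        else:
            self.if_nil += 1
            self.both_nil += 0 if ipr_result else 1

    def violations(self, min_nil=0.95, min_t=0.30):
        """Return the labels of the conditions the data contradicts."""
        problems = []
        if self.if_nil and self.both_nil / self.if_nil < min_nil:
            problems.append("iii")
        if self.if_t and self.both_t / self.if_t < min_t:
            problems.append("iv")
        return problems

# User number four: IfPotentiallyRelevant is always T.
always_t = RuleStats()
for if_result in [False] * 8 + [True] * 2:
    always_t.record(True, if_result)

# A faithful user: the prefilter agrees with IF on every trial.
faithful = RuleStats()
for if_result in [False] * 8 + [True] * 2:
    faithful.record(if_result, if_result)
```

With these tallies, user number four is caught on condition (iii): his
prefilter never screens anything out.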

Notice we are talking
not about the syntactic sort of type checking (``the slot must
be filled with a LISP predicate'') but rather about the semantics
of that slot, the meaning of the entry stored there, the manner
in which it interacts with the rest of the knowledge base.

EXAMPLE 3: The semantics of each part of the language can and should be
explicated.  This includes relevant data structures (e.g., ``The agenda
should be ordered roughly by the ultimate worth of the tasks on it''),
the units which define the control structure of the system
(e.g., ``The TopLevelControl unit should have its ToProcess slot evaluated
only once per run''),
and the unit for Semantics slots (which has a Semantics slot, of course.
This will affect what gets done with the semantics statements; are they
to be enforced, occasionally checked when convenient, used for tutoring
or analyzing the user, or turned off completely as if they were prose
comments).
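
The four dispositions just listed could be modelled as a setting consulted
on every write to a slot.  The Python sketch below is purely illustrative;
the mode names, the ``every tenth write'' policy for occasional checking,
and the log are our assumptions, not part of RLL.

```python
from enum import Enum

class Mode(Enum):
    ENFORCE = "enforce"        # refuse entries that violate the semantics
    SPOT_CHECK = "spot-check"  # check only occasionally
    TUTOR = "tutor"            # always check, but only comment
    OFF = "off"                # treat the semantics as a prose comment

def may_store(mode, check, entry, log, write_count=0, every=10):
    """Decide whether entry may be stored; note violations in log."""
    if mode is Mode.OFF:
        return True
    if mode is Mode.SPOT_CHECK and write_count % every != 0:
        return True                      # not convenient to check this time
    if check(entry):
        return True
    log.append(f"semantics violated by {entry!r}")
    return mode is not Mode.ENFORCE      # only ENFORCE refuses the entry

def is_natural(n):
    return isinstance(n, int) and n >= 0

log = []
enforced = may_store(Mode.ENFORCE, is_natural, -1, log)
tutored = may_store(Mode.TUTOR, is_natural, -1, log)
ignored = may_store(Mode.OFF, is_natural, -1, log)
skipped = may_store(Mode.SPOT_CHECK, is_natural, -1, log, write_count=3)
```

Only the ENFORCE call refuses the entry; the TUTOR call stores it but leaves
a remark in the log, and the other two never run the check.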

Neither of these goals (explicitness, enforceability)
is attainable,
absolutely, in any real system.   Nevertheless, AIMING at them
and getting ``part of the way'' can dramatically improve the
quality (the perceived helpfulness) of the environment.  
We are not alone in calling for the explication of the semantics
of the representation; see, for example, [...]   The suggestion
of actively monitoring the usage of the language, and checking for
misuse of the representation, is novel, and we shall focus on that
issue herein.

Specifically, we illustrate below how RLL (Representation
Language Language) represents the semantics of itself, and how
it uses that to double-check that it is being used in accord
with those semantics.

Ideas involved:
Most of the individual slots can have their semantics stated
explicitly in terms of the values of other slots, the behavior of
other slots over time, or (what is formally equivalent) some
empirically measurable quantity.

In cases where two knowledge bases are being merged, we assume
that slots present in the kernel RLL system have been used with
the same semantics by both users (we can of course verify that
the SEMANTICS slots of the corresponding units have not been
altered).  We further assume that any newly-named slots are
idiosyncratic, and that coincidences in naming (identical
overlap, substringing, and string partial matching) are
merely useful hints for places to look for semantic similarity.
Thus, if two users create a BestUser slot, the system would
(upon merging their knowledge bases) internally rename one to
be called BestUser1, and propose that the semantics of the
two slots be studied sometime, to see if they really do have
some correlation.  The problem of recognizing coidentifications
is open-ended; even in cases where full intensional specifications
have been given, the problem is no less than general theorem
proving.  In cases where the semantics is rule-defined or
partially specified, there may be no way to establish any more
than the PLAUSIBILITY of the two slots coinciding semantically.
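
The merging policy just described can be sketched as follows.  This is a
Python illustration under simplifying assumptions: each knowledge base is
reduced to a table from slot name to a Semantics description, the function
name is ours, and a numeric suffix is used for renaming as in the BestUser
example.

```python
def merge_slot_tables(kb_a, kb_b, kernel_slots):
    """Merge two slot tables (name -> Semantics description).

    Kernel slots must carry unaltered Semantics in both tables.  Any other
    name clash is treated as a coincidence: the second user's slot is
    renamed and the pair is queued for later study.
    """
    merged = dict(kb_a)
    taken = set(kb_a) | set(kb_b)
    to_study = []
    for name, semantics in kb_b.items():
        if name not in kb_a:
            merged[name] = semantics
        elif name in kernel_slots:
            if kb_a[name] != semantics:
                raise ValueError(f"Semantics of kernel slot {name} was altered")
        else:
            n = 1
            while f"{name}{n}" in taken:   # find an unused suffix
                n += 1
            new_name = f"{name}{n}"
            taken.add(new_name)
            merged[new_name] = semantics
            to_study.append((name, new_name))
    return merged, to_study

kb_a = {"Definition": "predicate for membership", "BestUser": "most frequent user"}
kb_b = {"Definition": "predicate for membership", "BestUser": "most skilled user"}
merged, to_study = merge_slot_tables(kb_a, kb_b, kernel_slots={"Definition"})
```

The sketch makes no attempt to decide whether the two BestUser slots
coincide; it only records the pair as a hint of where to look.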

Finally, we start the system with a large base of slots and
methods for creating new slots.  For each such method, we
provide instructions for how to construct the new description
of the semantics of the new, synthesized slot.  Thus, the
well-meaning user is almost bound NOT to go astray, NOT to
diverge from other users.

Below, we illustrate each of these ideas as currently
implemented in RLL.


Below is part of the unit called Examples in RLL.  We also depict
Extremize, one of the methods for forming new slots out of old ones.
Each of them has a Semantics slot, which points to another unit
dealing exclusively with the semantics of each.  Furthermore,
Extremize contains a NewSemantics slot, indicating how to change
the Semantics slot of S, when it synthesizes a new slot, Extreme-S,
out of slot S.  We show how Extremize applies to Examples, thereby
creating the new kind of slot Extreme-Examples, and how the
Semantics slot of this new unit gets filled in automatically.
Of course, if you go playing with the Semantics slot of the
Semantics unit, don't come crying to me.
----- ALT: ----
We also depict Extreme, a relation, and Constrain, one of the
methods for forming new slots out of old ones.

Furthermore, Constrain contains a NewSemantics slot, indicating how to
change the Semantics slot of S (by adding the semantics of C),
when Constrain is called to synthesize a new slot C-S out of the old
slot S and the constraint C.  We show how Constrain applies to
Examples and Extreme, thereby creating the useful new slot
Extreme-Examples, and automatically filling in the Semantics slot
of that new unit.
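
To make the NewSemantics idea concrete, here is a rough Python sketch of the
Constrain variant.  It is not RLL code: semantics are modelled as predicates
over (unit, entries), the dictionary layout is assumed, and the particular
meaning given to Extreme (an entry is the smallest or largest of the unit's
Examples) is only a stand-in for illustration.

```python
def constrain(constraint_name, constraint, slot_name, slots):
    """Synthesize slot C-S from slot S and constraint C.

    The new Semantics is the old Semantics of S with the semantics of C
    added, which is the role the text assigns to NewSemantics.
    """
    old = slots[slot_name]
    old_semantics = old["Semantics"]

    def new_semantics(unit, entries):
        return old_semantics(unit, entries) and constraint(unit, entries)

    new_name = f"{constraint_name}-{slot_name}"
    slots[new_name] = {"SuperSlots": [slot_name], "Semantics": new_semantics}
    old.setdefault("SubSlots", []).append(new_name)
    return new_name

def examples_semantics(unit, entries):
    """Every entry satisfies the predicate in the unit's Definition slot."""
    return all(unit["Definition"](e) for e in entries)

def extreme(unit, entries):
    """Stand-in for Extreme: each entry is the min or max of Examples."""
    bounds = (min(unit["Examples"]), max(unit["Examples"]))
    return all(e in bounds for e in entries)

slots = {"Examples": {"Semantics": examples_semantics}}
new_slot = constrain("Extreme", extreme, "Examples", slots)

evens = {"Definition": lambda n: n % 2 == 0, "Examples": [2, 4, 6, 8]}
check = slots[new_slot]["Semantics"]
```

In this sketch check(evens, [2, 8]) holds, while check(evens, [4]) fails the
added constraint and check(evens, [9]) fails the inherited one.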


Examples

	Isa		(Slot)
	MakesSenseFor	AnyClass
	SuperSlots	(AKindOf)
	SubSlots	(ExtremeExamples TypicalExamples)
	Semantics	(SEEUNIT . SemanticsOfExamples)

SemanticsOfExamples
	Isa		SlotOfSomeUnit
	value		(λ (Unit Entries) (EVERY Entries (Defn Unit)]
	ConstraintsOnEachEntry
	ConstraintsOnEachSetOfEntries
	ConstraintsOnCollectionOfAllSetsOfEntries
	ConstraintsOnChangesInEachEntryOverTime
	ConstraintsOnChangesInEachSetOfEntriesOverTime
	ConstraintsOnCollectionOfAllSetsOfEntriesOverTime



---


It is clear that at one extreme, the notion of enforced semantics
is nothing more than the syntactic type checking prevalent in
many languages and systems (e.g., PASCAL's strong typing).  What is
far less clear is where the boundary lies between syntactic range
checking and semantic checking.  Let us consider IfPotentiallyRelevant once more.
Its conditions are that all entries be (i) Lisp predicates,
(ii) which run very quickly, and (iii) which correlate with the predicate
stored in the IF slot.  The first of these conditions is clearly
syntactic.  What about the second?  We could define a (virtual) slot
called AverageRunningTime, and keep updating it as the program ran.
We could keep this for both the IF and the IfPotentiallyRelevant 
slots, and even define another new slot IfPoten/IF which was their
ratio.
A simple range check on the value filled in there would then suffice
to trigger an alarm if the average running time of any unit's 
IfPotentiallyRelevant was NOT much smaller than the average running
time of its IF slot.
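
A sketch of that range check, in Python, might read as follows.  The class
and function names are invented for the illustration, the timings are
supplied by hand rather than measured, and the threshold of 0.1 for ``much
smaller'' is an arbitrary assumption.

```python
class RunningAverage:
    """Hypothetical AverageRunningTime slot, updated as the program runs."""

    def __init__(self):
        self.total = 0.0
        self.runs = 0

    def update(self, elapsed):
        self.total += elapsed
        self.runs += 1

    def value(self):
        return self.total / self.runs

def prefilter_too_slow(ipr_avg, if_avg, max_ratio=0.1):
    """Range check on the IfPoten/IF ratio; True means raise the alarm."""
    return ipr_avg.value() / if_avg.value() > max_ratio

# Rule A: a quick prefilter in front of an expensive IF (times in msec).
a_ipr, a_if = RunningAverage(), RunningAverage()
for t in (1, 1):
    a_ipr.update(t)
for t in (50, 50):
    a_if.update(t)

# Rule B: all the testing has been packed into IfPotentiallyRelevant.
b_ipr, b_if = RunningAverage(), RunningAverage()
for t in (40, 60):
    b_ipr.update(t)
for t in (50, 50):
    b_if.update(t)

alarm_a = prefilter_too_slow(a_ipr, a_if)
alarm_b = prefilter_too_slow(b_ipr, b_if)
```

Rule A passes quietly, while Rule B, whose prefilter costs as much as its
IF, triggers the alarm.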

Next idea:
Each range constraint indicates what happens when it's violated.
Moreover, there are different kinds of range constraints;
thus, we might say that the range of PRIME-FACTORS is usually
a bag of size about log(n), and the largest one is usually
about sqrt(n), and it is always a bag of natural numbers,
and the probability of.... etc.  Some of these are "always", and
signify contradictions, mal-consensus, paradoxes, errors, bugs,...
if they are violated.  Some are "usually", and may even characterize
their violations (as may the former case, a la Hacker).  Some of those
may signify probable errors, or errors if they persist (or occur
widely over the system at one time), or potential interesting things
to scrutinize, or to be inducted upon into a new concept, etc.
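
The distinction between kinds of range constraints can be illustrated with
PRIME-FACTORS.  The Python sketch below is only a sketch: the tuple layout,
the action names, and the numeric tolerances chosen for ``about log(n)''
and ``about sqrt(n)'' are assumptions.

```python
import math

def prime_factors(n):
    """Return the bag (list with repeats) of prime factors of n >= 2."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

# Each constraint: (kind, description, test, action when violated).
CONSTRAINTS = [
    ("always", "entries are natural numbers",
     lambda n, fs: all(isinstance(f, int) and f > 0 for f in fs), "error"),
    ("usually", "size is about log(n)",
     lambda n, fs: len(fs) <= math.log2(n), "scrutinize"),
    ("usually", "largest factor is about sqrt(n)",
     lambda n, fs: max(fs) <= 2 * math.sqrt(n), "scrutinize"),
]

def check(n, factors):
    """Return (kind, description, action) for each violated constraint."""
    return [(kind, desc, action)
            for kind, desc, test, action in CONSTRAINTS
            if not test(n, factors)]

ok = check(36, prime_factors(36))          # [2, 2, 3, 3]
odd_case = check(97, prime_factors(97))    # [97]
broken = check(36, [2, 2, -3, -3])         # not a bag of naturals
```

The value for 36 satisfies everything.  The value for 97 violates only a
``usually'' constraint, which marks it as something to scrutinize (it is, of
course, a prime), whereas the negative entries violate an ``always''
constraint and signal an outright error.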

Acknowledgements

I wish to thank Mike Genesereth, who first raised my consciousness
about semantics issues; Howie Shrobe, John Brown, and Johan
de Kleer, who variously let me know my consciousness should get
raised; Greg Harris, who tried to let me know;
Stan Rosenschein, who pushed me into conceiving Enforced
Semantics; Russ Greiner and Dave Smith, for helping bring RLL into
a running state; and finally Mark Stefik and Rick Hayes-Roth, for
providing some external sense of proportion and stability.